Simulation Study Setup

I simulate the data in modules (blocks). Within each module, I first generate a random standard normal vector (the eigengene) and then calculate the expression levels of all other genes in the module as a function of that eigengene.
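The exact function linking each gene to its eigengene is not stated here, so the following is only a minimal sketch under an assumed form: each gene equals a fixed loading times the eigengene plus independent normal noise, scaled so every gene has unit variance. The names `simulate_module` and `loading` are hypothetical.

```python
import math
import random


def simulate_module(n_samples, n_genes, loading=0.8, seed=0):
    """Return (eigengene, genes), where genes[j][i] is gene j in sample i.

    Assumption: gene = loading * eigengene + sqrt(1 - loading^2) * noise.
    """
    rng = random.Random(seed)
    # Latent eigengene: one standard normal draw per sample.
    eigengene = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Noise scale chosen so that each gene has variance 1.
    noise_sd = math.sqrt(1.0 - loading ** 2)
    genes = [
        [loading * e + noise_sd * rng.gauss(0.0, 1.0) for e in eigengene]
        for _ in range(n_genes)
    ]
    return eigengene, genes
```

Under this assumed form, every gene in the module has correlation `loading` with the eigengene, and any two genes in the module have correlation `loading ** 2` with each other.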

In this setup I simulate 6 modules with the following proportions and environment-dependent correlations:

  1. Turquoise (0.15). Correlated with Blue module regardless of environment
  2. Blue (0.15). Correlated with Turquoise module regardless of environment
  3. Red (0.15). Correlated with Green module only when E=1
  4. Green (0.15). Correlated with Red module only when E=1
  5. Yellow (0.15). Uncorrelated with any other module regardless of environment
  6. Grey (0.25). Uncorrelated within and between modules regardless of environment
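One way to obtain a correlation that appears only when E=1, as described for the Red and Green modules, is to let the Green eigengene share a component with the Red eigengene for exposed samples only. This is a sketch of that idea, not necessarily the mechanism used in this study; the function name and the parameter `rho` are hypothetical.

```python
import math
import random


def red_green_eigengenes(env, rho, seed=0):
    """env is a list of 0/1 exposure indicators; returns (red, green)."""
    rng = random.Random(seed)
    red, green = [], []
    for e in env:
        r = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        # Target correlation is rho for exposed samples (E = 1), 0 otherwise.
        rho_e = rho if e == 1 else 0.0
        # Mixing r with independent noise z keeps unit variance for green.
        g = rho_e * r + math.sqrt(1.0 - rho_e ** 2) * z
        red.append(r)
        green.append(g)
    return red, green
```

The Turquoise and Blue pair could be handled the same way by using the same `rho` for every sample, and Yellow by drawing its eigengene independently.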

I have kept the number of active variables fixed at 50, with the first 25 genes in each of the Green and Red modules corresponding to the active genes. All active genes also have interaction terms with the environment. The true model therefore contains 51 main effects (50 genes + 1 environment) and 50 interactions associated with the response. I generate the main effects from a \(Unif(3.9,4.1)\), the interaction effects from a \(Unif(1.9,2.1)\), and set \(\beta_E = 5\).
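To make the coefficient structure concrete, here is a small sketch that draws the 50 main effects and 50 interaction effects from the stated uniform ranges and evaluates the noiseless linear predictor for one sample. The error distribution of the response is not specified in this section, so no noise term is added; the function names are hypothetical.

```python
import random


def draw_coefficients(n_active=50, seed=0):
    """Draw main and interaction effects for the active genes."""
    rng = random.Random(seed)
    beta_main = [rng.uniform(3.9, 4.1) for _ in range(n_active)]
    beta_inter = [rng.uniform(1.9, 2.1) for _ in range(n_active)]
    beta_env = 5.0  # main effect of the environment E
    return beta_main, beta_inter, beta_env


def linear_predictor(x_active, e, beta_main, beta_inter, beta_env):
    """x_active: expression of the active genes for one sample; e: 0 or 1."""
    main = sum(b * x for b, x in zip(beta_main, x_active))
    inter = sum(a * x * e for a, x in zip(beta_inter, x_active))
    return main + beta_env * e + inter
```

For an unexposed sample (`e = 0`) only the 50 gene main effects contribute; for an exposed sample the environment effect and all 50 interaction terms are added as well.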

I plan to vary the total number of genes (500, 1000, 3000) while keeping both the module proportions and the number of active variables fixed. I also plan to vary the degree of correlation between the Green and Red modules when E=1 (0.1, 0.35, 0.75, 0.95).
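Crossing the three gene counts with the four correlation levels gives 12 scenarios. The sketch below enumerates them and derives the module sizes implied by the fixed proportions; the dictionary layout and names are illustrative only.

```python
from itertools import product

N_GENES = (500, 1000, 3000)
RHO_E1 = (0.1, 0.35, 0.75, 0.95)  # Green-Red correlation when E = 1
N_ACTIVE = 50
MODULE_PROPORTIONS = {
    "turquoise": 0.15, "blue": 0.15, "red": 0.15,
    "green": 0.15, "yellow": 0.15, "grey": 0.25,
}


def module_sizes(n_genes):
    """Number of genes per module for a given total gene count."""
    sizes = {m: int(round(p * n_genes)) for m, p in MODULE_PROPORTIONS.items()}
    # Put any rounding remainder in grey so the sizes sum to n_genes.
    sizes["grey"] += n_genes - sum(sizes.values())
    return sizes


scenarios = [
    {"n_genes": p, "rho": rho, "n_active": N_ACTIVE, "sizes": module_sizes(p)}
    for p, rho in product(N_GENES, RHO_E1)
]
```

With 500 genes, for example, each coloured module has 75 genes and Grey has 125, so the 25 active genes per module make up a third of Red and of Green; with 3000 genes they make up about 6% of each.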

Below I have plotted heatmaps of several similarity matrices, labeled with the module each gene truly belongs to and with which genes are active.

Heatmaps of Correlations

All

E=0

E=1

E=1 - E=0

\(S = |\rho_{E=0} + \rho_{E=1} - 1.5 \rho|\)
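For reference, the score above can be computed elementwise from the three correlation matrices (unexposed, exposed, and all samples pooled). This sketch assumes the matrices are supplied as equally sized nested lists; the function name is hypothetical.

```python
def s_score(corr_e0, corr_e1, corr_all):
    """Elementwise S = |rho_E0 + rho_E1 - 1.5 * rho_all|."""
    return [
        [abs(r0 + r1 - 1.5 * ra) for r0, r1, ra in zip(row0, row1, row_all)]
        for row0, row1, row_all in zip(corr_e0, corr_e1, corr_all)
    ]
```

Note that the diagonal of S equals 0.5 rather than 0 or 1, because every correlation matrix has ones on its diagonal and |1 + 1 - 1.5| = 0.5.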

Fisher’s Z Transformation
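The section does not say exactly how the Fisher-transformed similarity is formed. A common choice, assumed here, is to transform each correlation with \(z = \operatorname{atanh}(r)\) and standardize the difference between environments by its approximate standard error. The function names and the sample-size arguments `n1` and `n0` are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient."""
    return math.atanh(r)


def fisher_z_difference(r1, n1, r0, n0):
    """Standardized difference between correlations under E=1 and E=0.

    n1 and n0 are the numbers of samples with E=1 and E=0 (each must exceed 3).
    """
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n0 - 3))
    return (fisher_z(r1) - fisher_z(r0)) / se
```

Under this assumed definition, gene pairs whose correlation does not change with the environment score near zero, while Red-Green pairs should score far from zero once their E=1 correlation is high.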